New temporal features for robust speech recognition with emphasis on microphone variations

Authors

  • Jia-Lin Shen
  • Wen L. Hwang
Abstract

Although the delta and RASTA methods have been widely used to extract the temporal properties of stationary features for robust speech recognition, there is still a need to investigate new temporal features for better performance. In this paper, we present two new temporal features for robust processing of speech signals, with emphasis on microphone variations. First, a temporal feature is derived from a bank of RASTA-like filters, in which the parameters of each filter are estimated according to the statistical properties of the speech signals. Second, a parameterized temporal filter (PTF) is proposed. The filter is described by four parameters: the passband, the beginning transition, the ending transition, and the smoothness of the magnitude of the filter response. Together, these parameters determine the magnitude of the frequency response of the PTF, and a transformation algorithm is then used to derive temporal coefficients with real and causal characteristics. The discriminative ability of the PTF features can be further enhanced using the minimum classification error (MCE) algorithm. Experimental results show that the RASTA features are inferior to the PTF features both in quiet conditions and in the presence of microphone variations.
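As background for the delta and RASTA baselines named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the usual regression form of delta coefficients and a causal variant of the commonly cited RASTA band-pass filter with a pole at 0.98. The frame rate, the 4 Hz test trajectory, and the constant offset standing in for a different microphone (a fixed channel is approximately an additive constant in the log-spectral or cepstral domain) are illustrative assumptions, as are all function names. The PTF itself is not sketched, since the abstract does not give its parameterization in enough detail.

```python
import math

def delta(track, N=2):
    """Regression-style delta over +/-N frames; edges are padded by repetition."""
    T = len(track)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        acc = 0.0
        for n in range(1, N + 1):
            acc += n * (track[min(t + n, T - 1)] - track[max(t - n, 0)])
        out.append(acc / denom)
    return out

def rasta(track, pole=0.98):
    """Causal RASTA-style band-pass filter applied to one feature trajectory.

    y[t] = pole * y[t-1] + 0.1 * (2x[t] + x[t-1] - x[t-3] - 2x[t-4])
    The history before frame 0 is assumed equal to x[0] (an assumption made
    here to avoid a start-up transient).
    """
    def x(t, k):
        return track[max(t - k, 0)]

    out = []
    y_prev = 0.0
    for t in range(len(track)):
        # Written as differences so that a constant input cancels exactly.
        fir = 0.1 * (2.0 * (x(t, 0) - x(t, 4)) + (x(t, 1) - x(t, 3)))
        y_prev = pole * y_prev + fir
        out.append(y_prev)
    return out

def max_abs_diff(a, b):
    return max(abs(p - q) for p, q in zip(a, b))

# Hypothetical test signal: one log-spectral trajectory with 4 Hz modulation,
# sampled at an assumed 100 frames per second.
FRAME_RATE = 100
clean = [math.sin(2 * math.pi * 4 * t / FRAME_RATE) for t in range(200)]

# The same trajectory through a "different microphone", modeled as a constant offset.
other_mic = [v + 1.5 for v in clean]

# Both temporal features should be nearly unaffected by the constant offset.
print("delta mismatch:", max_abs_diff(delta(clean), delta(other_mic)))
print("RASTA mismatch:", max_abs_diff(rasta(clean), rasta(other_mic)))
```

Both filters have zero gain at zero modulation frequency, which is why the constant offset drops out while the 4 Hz modulation is kept. This illustrates only the baseline behavior that the proposed features are compared against.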


Similar resources

Convolutional neural network with adaptive windows for speech recognition

Although speech recognition systems are widely used and their accuracy is continuously increasing, there is a considerable gap between their accuracy and human recognition ability. This is partially due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...


An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition

Convolutional Neural Networks (CNNs) have shown their performance in speech recognition systems for feature extraction and acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. The Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...


A new method for missing-data-based robust speech recognition using a bidirectional neural network

The performance of speech recognition systems is greatly reduced when speech is corrupted by noise. One common approach to robust speech recognition is the missing-feature method. In this approach, the components of the time-frequency representation of the signal (spectrogram) that have a low signal-to-noise ratio (SNR) are tagged as missing and deleted, then replaced using the remaining components and statistical ...


UNIVERSITY OF WEST BOHEMIA IN PILSEN DEPARTMENT OF CYBERNETIC Optimization of Features for Robust Speaker Recognition

Currently, the old feature extraction method that was used early on for speech recognition is used for speaker recognition in our speaker recognition group. Standard Mel Frequency Cepstral Coefficient (MFCC) features are used. They can optionally be extended by delta and acceleration coefficients. Whereas features for speech recognition have been evolved and optimized until now, features for sp...


SRI November 1993 CSR Spoke Evaluation

In this paper we present SRI’s results on the 1993 ARPA CSR Spoke Evaluations. This evaluation used the same HMM acoustic models as those used in SRI’s hub system: gender-dependent Genonic HMM’s. The system was made robust by modifying the front end algorithms to estimate the cepstral features (the HMM models were not modified). The robust front-end used a wide bandwidth (100-6400Hz) and estima...



Journal:
  • Computer Speech & Language

Volume 13  Issue

Pages  -

Publication date: 1999